{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scatterplots in pydeck: A case study using Beijing subway stops\n",
    "\n",
    "*(About 5-10 minutes to read)*\n",
    "\n",
    "Since 1978, China's GDP has nearly doubled every seven years [[1](https://en.wikipedia.org/wiki/Historical_GDP_of_China)]. That sort of exponential growth has led to rapid internal change within the country&#8212;as demonstrated in part by the rapid changes within Beijing's urban infrastructure.\n",
    "\n",
    "Below we'll plot the location of Beijing subway stops over time. Locations for subway stops come from [Wikipedia](https://en.wikipedia.org/wiki/List_of_Beijing_Subway_stations) and [OpenStreetMap](https://wiki.openstreetmap.org/wiki/Beijing_Subway). Do note that this is not intended to be a rigorous study, so some stops may be missing.\n",
    "\n",
    "## Contents\n",
    "\n",
    "- [Getting the data](#Getting-the-data)\n",
    "- [Data cleaning](#Data-cleaning)\n",
    "- [Using pydeck to automatically create a viewport](#Automatically-generate-a-viewport)\n",
    "- [Plotting the data](#Plotting-the-data)\n",
    "- [Plotting the data over time](#Playing-the-data-forward-in-time)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting the data\n",
    "\n",
    "First, we can use the [Pandas library](https://pandas.pydata.org/) to download our data. You're likely already familiar with its existence–Pandas is likely the most popular library in Python for filtering, aggregating, and joining data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ast import literal_eval\n",
    "\n",
    "import pandas as pd\n",
    "from pydeck import (\n",
    "    data_utils,\n",
    "    Deck,\n",
    "    Layer\n",
    ")\n",
    "\n",
    "# First, let's use Pandas to download our data\n",
    "URL = 'https://raw.githubusercontent.com/ajduberstein/data_sets/master/beijing_subway_station.csv'\n",
    "df = pd.read_csv(URL)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data cleaning\n",
    "\n",
    "Next, we'll have to engage in some necessary data housekeeping:\n",
    "\n",
    "1. We have to re-code position to be one field and in a list.\n",
    "2. The CSV encodes the `[R, G, B, A]` color values a `str`, and `literal_eval` lets us convert that string a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We have to re-code position to be one field in a list, so we'll do that here:\n",
    "df['position'] = df.apply(lambda x: [x['lng'], x['lat']], axis=1)\n",
    "# The CSV encodes the [R, G, B, A] color values listed in it as a string\n",
    "df['color'] = df.apply(lambda x: literal_eval(x['color']), axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Automatically generate a viewport\n",
    "\n",
    "pydeck features some nifty utilities for visualizing data, like an **automatic zoom using `data_utils.autocompute_viewport`** for 2D data sets.\n",
    "\n",
    "We'll render the viewport, as well, just to verify that the visualization looks sensible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use pydeck's data_utils module to fit a viewport to the data\n",
    "viewport = data_utils.autocompute_viewport(points=df[['lng', 'lat']], view_proportion=0.9)\n",
    "auto_zoom_map = Deck(layers=None, initial_view_state=viewport)\n",
    "auto_zoom_map.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sure enough, we're centered to Beijing.\n",
    "\n",
    "# Plotting the data\n",
    "\n",
    "We'll render the data and use some Jupyter notebook functionality to provide a header with a year.\n",
    "\n",
    "It's worth spending some time on each line, if you haven't seen the Layer object yet:\n",
    "\n",
    "```python\n",
    "scatterplot = Layer(\n",
    "    'ScatterplotLayer',\n",
    "    df,\n",
    "    radius=500,\n",
    "    get_fill_color='color',\n",
    "    get_position='position')\n",
    "```\n",
    "\n",
    "\n",
    "\n",
    "**We can specify the layer type as the first argument, the data as the second, and the layer arguments as keywords.** **[`ScatterplotLayer`](https://github.com/uber/deck.gl/blob/master/docs/layers/scatterplot-layer.md)** is one of a list of layers available in the deck.gl core library. We'll also provide a header to list the year using some built-in Jupyter notebook tools.\n",
    "\n",
    "As aside, for a list of other layers, see the [deck.gl documentation](https://github.com/uber/deck.gl/tree/master/docs/layers#deckgl-layer-catalog-overview). Remember that deck.gl is a JavaScript library and not a Python one, so the documentation may differ for some kinds of terminology and functionality (e.g., pydeck doesn't support passing functions as arguments but this is a common occurrence within deck.gl)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "from IPython.core.display import display\n",
    "import ipywidgets\n",
    "\n",
    "year = 2019\n",
    "\n",
    "scatterplot = Layer(\n",
    "    'ScatterplotLayer',\n",
    "    df,\n",
    "    id='scatterplot-layer',\n",
    "    radius=500,\n",
    "    get_fill_color='color',\n",
    "    get_position='position')\n",
    "r = Deck(layers=[scatterplot], initial_view_state=viewport)\n",
    "display_el = ipywidgets.HTML('<h1>' + str(year) + '</h1>')\n",
    "display(display_el)\n",
    "r.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Playing the data forward in time\n",
    "\n",
    "Finally, we can loop through the data and see the dramatic development in Beijing since 1971, as demonstrated by subway stop opening dates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "for y in range(1971, 2020):\n",
    "    scatterplot.data = df[df['opening_date'] <= str(y)]\n",
    "    year = y\n",
    "    display_el.value = '<h1>' + str(year) + '</h1>'\n",
    "    r.update()\n",
    "    time.sleep(0.2)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
